Introduction to the grammar of graphics

Erik Fredner

2024-08-30

Why visualize data well?

What are key considerations for visualizing data well?

  • Keep it simple! (KISS principle)
  • Don’t cherry-pick
    • Represent data truthfully
  • Make accessible visualizations
    • e.g., color palettes like viridis stay readable for people with color vision deficiency
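
As a minimal sketch of the accessibility point, the snippet below swaps ggplot2's default colors for its built-in discrete viridis scale. The `food_sample` data frame is an assumption for illustration: it is copied by hand from the first ten rows of the food table printed later in this deck, and the object names are hypothetical.

```r
library(ggplot2)

# Hand-copied sample (assumption): first ten rows of the food table in this deck
food_sample <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Chickpea",
           "String Bean", "Beef", "Bell Pepper", "Crab", "Broccoli"),
  food_group = c("fruit", "vegetable", "fruit", "fruit", "grains",
                 "vegetable", "meat", "vegetable", "fish", "vegetable"),
  calories = c(52, 20, 160, 89, 180, 31, 288, 26, 87, 34),
  carbs = c(13.8, 3.88, 8.53, 22.8, 30.0, 7.13, 0, 6.03, 0.04, 6.64)
)

p_viridis <- food_sample |>
  ggplot() +
  # color is mapped to a column, so it goes inside aes()
  geom_point(aes(x = calories, y = carbs, color = food_group)) +
  # replace the default palette with the discrete viridis palette
  scale_color_viridis_d()

p_viridis
```

For a continuous variable, `scale_color_viridis_c()` plays the same role.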

What is the grammar of graphics?

Origin

Key gg concepts

Data visualizations (simple or complex) are composed of layers. Each layer consists of three parts:

Key    Description
data   Tabular dataset associated with the layer
geom   Graphical element that represents each observation
aes    Mappings that associate features (columns) in the dataset with visual properties of the geometry, such as position or color

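
To make the three parts concrete, here is a minimal sketch of a single layer with each part passed explicitly. The `food_sample` data frame is an assumption for illustration (a few rows copied by hand from the food data used in this deck), and `p_layer` is a hypothetical name.

```r
library(ggplot2)

# Hand-copied sample (assumption): a few rows of the food data used in this deck
food_sample <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Beef"),
  calories = c(52, 20, 160, 89, 288),
  carbs = c(13.8, 3.88, 8.53, 22.8, 0)
)

p_layer <- ggplot() +
  geom_point(                               # geom: one point per observation
    data = food_sample,                     # data: the table behind this layer
    mapping = aes(x = calories, y = carbs)  # aes: columns mapped to x and y
  )

p_layer
```

In practice the data is usually piped into `ggplot()` once and inherited by every layer, which is the style the examples in this deck use.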
Example data set: food

library(tidyverse)

food <- read_csv("../data/food.csv")

food |>
  select(item, food_group, calories, carbs)
# A tibble: 61 × 4
   item        food_group calories carbs
   <chr>       <chr>         <dbl> <dbl>
 1 Apple       fruit            52 13.8 
 2 Asparagus   vegetable        20  3.88
 3 Avocado     fruit           160  8.53
 4 Banana      fruit            89 22.8 
 5 Chickpea    grains          180 30.0 
 6 String Bean vegetable        31  7.13
 7 Beef        meat            288  0   
 8 Bell Pepper vegetable        26  6.03
 9 Crab        fish             87  0.04
10 Broccoli    vegetable        34  6.64
# ℹ 51 more rows

Example scatter plot

Observations represented by dots:

food |>
  ggplot() +
  # note that this is geom_point:
  geom_point(aes(x = calories, y = carbs))

Example text plot

Observations represented by the item label:

food |>
  ggplot() +
  # note that this is geom_text:
  geom_text(aes(x = calories, y = carbs, label = item))

Example complex bar plot

Observations represented by bars:

food |>
  # filter for high cholesterol foods:
  filter(cholesterol > 50) |>
  ggplot() +
  # set bar chart
  geom_col(aes(
    # sort bars by descending amount of cholesterol
    x = reorder(item, -cholesterol),
    y = cholesterol,
    # set bar color (fill) by food_group
    fill = food_group
  ))
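
The sorting in the bar plot comes from `reorder()`, which is easier to see in isolation. The sketch below uses base R only. Because the cholesterol values are not printed in this deck, it sorts by `calories` instead, using a hand-copied sample of rows (an assumption for illustration).

```r
# Hand-copied sample (assumption): a few rows of the food data used in this deck
food_sample <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Beef"),
  calories = c(52, 20, 160, 89, 288)
)

# reorder() returns a factor whose levels are sorted by the second argument
# in ascending order; negating the values yields descending order
item_sorted <- reorder(food_sample$item, -food_sample$calories)

levels(item_sorted)
# "Beef" "Avocado" "Banana" "Apple" "Asparagus"
```

ggplot draws discrete axis values in factor-level order, so the highest-valued item ends up leftmost.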

Syntax review

With ggplot, you can combine multiple layers to create simple or complex data visualizations. In general terms, the structure is:

data |>
  ggplot() +
  geom_...(aes(x = ..., y = ...)) +
  ...
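
As one possible instance of this template, the sketch below stacks two layers on the same data: points, plus text labels drawn just above them. The `food_sample` data frame is an assumption for illustration (rows copied by hand from the food data), and `p_layers` is a hypothetical name.

```r
library(ggplot2)

# Hand-copied sample (assumption): a few rows of the food data used in this deck
food_sample <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Beef"),
  calories = c(52, 20, 160, 89, 288),
  carbs = c(13.8, 3.88, 8.53, 22.8, 0)
)

p_layers <- food_sample |>
  ggplot() +
  # layer 1: one point per food item
  geom_point(aes(x = calories, y = carbs)) +
  # layer 2: the item name, shifted upward so it does not cover the point
  geom_text(aes(x = calories, y = carbs, label = item), vjust = -1)

p_layers
```

Layers are drawn in the order they are added, so later layers sit on top of earlier ones.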

Fixed aesthetics in ggplot

Why assign fixed aesthetics?

  • Sometimes you want to set aesthetics that are not tied to the data.
  • For example, you might want to set the color of all points to be green.

Example scatter plot with fixed aesthetics

food |>
  ggplot() +
  geom_point(
    aes(
      x = calories,
      y = carbs
    ),
    # note that size and color go outside of aes()
    # because they apply to all points:
    size = 10,
    color = "green"
  )
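
The difference between mapped and fixed aesthetics is easiest to see side by side. The sketch below is an illustration under stated assumptions: `food_sample` is a hand-copied subset of the food data, and the object names are hypothetical. The third plot shows a common slip, where a fixed value is placed inside `aes()` and ggplot treats it as data.

```r
library(ggplot2)

# Hand-copied sample (assumption): a few rows of the food data used in this deck
food_sample <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Beef"),
  food_group = c("fruit", "vegetable", "fruit", "fruit", "meat"),
  calories = c(52, 20, 160, 89, 288),
  carbs = c(13.8, 3.88, 8.53, 22.8, 0)
)

# Mapped: color varies with a column, so it goes inside aes()
p_mapped <- ggplot(food_sample) +
  geom_point(aes(x = calories, y = carbs, color = food_group))

# Fixed: every point gets the same color, so it goes outside aes()
p_fixed <- ggplot(food_sample) +
  geom_point(aes(x = calories, y = carbs), color = "green")

# Pitfall: inside aes(), "green" is treated as a data value, so the points
# take a default palette color and a legend labeled "green" appears
p_pitfall <- ggplot(food_sample) +
  geom_point(aes(x = calories, y = carbs, color = "green"))

p_fixed
```

Printing `p_mapped` or `p_pitfall` in place of `p_fixed` shows the other two behaviors.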

Summary

  • ggplot2 implements the “grammar of graphics.”
  • Visualizations are composed of layers, which can be added with +.
  • Each layer has at least three parts: data, geom, and aes.
  • Functions discussed:
    • ggplot(): Initializes a plot object.
    • geom_point(): Creates scatter plots.
    • geom_text(): Displays text labels.
    • geom_col(): Creates bar plots using values in data.
  • Best practices:
    • Keep it simple.
    • Keep it truthful.
    • Keep it accessible.